social category
ChatGPT's Horny Era Could Be Its Stickiest Yet
OpenAI will soon let adults create erotic content in ChatGPT. Experts say that could lead to "emotional commodification," or horniness as a revenue stream. In May of 2024, while I was combing through OpenAI's "Model Spec" laying out how ChatGPT should act, one comment buried in the document struck me as peculiar. It said OpenAI was "exploring" how to let adult ChatGPT users generate content with mature themes such as "erotica, extreme gore, slurs, and unsolicited profanity." Seems like the exploration phase is over.
A Taxonomy of Stereotype Content in Large Language Models
Nicolas, Gandalf, Caliskan, Aylin
This study introduces a taxonomy of stereotype content in contemporary large language models (LLMs). We prompt ChatGPT 3.5, Llama 3, and Mixtral 8x7B, three powerful and widely used LLMs, for the characteristics associated with 87 social categories (e.g., gender, race, occupations). We identify 14 stereotype dimensions (e.g., Morality, Ability, Health, Beliefs, Emotions), accounting for ~90% of LLM stereotype associations. Warmth and Competence facets were the most frequent content, but all other dimensions were significantly prevalent. Stereotypes were more positive in LLMs (vs. humans), but there was significant variability across categories and dimensions. Finally, the taxonomy predicted the LLMs' internal evaluations of social categories (e.g., how positively/negatively the categories were represented), supporting the relevance of a multidimensional taxonomy for characterizing LLM stereotypes. Our findings suggest that high-dimensional human stereotypes are reflected in LLMs and must be considered in AI auditing and debiasing to minimize unidentified harms from reliance on low-dimensional views of bias in LLMs.
Characterizing Stereotypical Bias from Privacy-preserving Pre-Training
Arnold, Stefan, Gröbner, Rene, Schreiner, Annika
Differential Privacy (DP) can be applied to raw text by exploiting the spatial arrangement of words in an embedding space. We investigate the implications of such text privatization on Language Models (LMs) and their tendency towards stereotypical associations. Since previous studies documented that linguistic proficiency correlates with stereotypical bias, one could assume that techniques for text privatization, which are known to degrade language modeling capabilities, would cancel out undesirable biases. By testing BERT models trained on texts containing biased statements primed with varying degrees of privacy, our study reveals that while stereotypical bias generally diminishes when privacy is tightened, text privatization does not uniformly equate to diminishing bias across all social domains. This highlights the need for careful diagnosis of bias in LMs that undergo text privatization.
Analyzing Social Biases in Japanese Large Language Models
Yanaka, Hitomi, Han, Namgi, Kumon, Ryoma, Lu, Jie, Takeshita, Masashi, Sekizawa, Ryo, Kato, Taisei, Arai, Hiromi
With the development of Large Language Models (LLMs) across languages, there is a growing interest in the extent to which models exhibit social biases against diverse categories. Various social bias benchmarks have been provided (Rudinger et al., 2018; Zhao et al., 2018; Nangia et al., 2020; Li et al., 2020; Nadeem et al., 2021; Dhamala et al.,
BBQ (Parrish et al., 2022) is a Question Answering (QA) dataset to assess whether models can correctly understand the context of various social categories, and is widely used to evaluate social biases in LLMs. We describe the details of BBQ in Section 3. CrowS-Pairs (Nangia et al., 2020) is a dataset for analyzing the social biases of masked language models with fill-in-the-blank questions about social categories.
Debiasing Word Embeddings with Nonlinear Geometry
Cheng, Lu, Kim, Nayoung, Liu, Huan
Debiasing word embeddings has been largely limited to individual and independent social categories. However, real-world corpora typically present multiple social categories that possibly correlate or intersect with each other. For instance, "hair weaves" is stereotypically associated with African American females, but with neither African Americans nor females alone. Therefore, this work studies biases associated with multiple social categories: joint biases induced by the union of different categories and intersectional biases that do not overlap with the biases of the constituent categories. We first empirically observe that individual biases intersect non-trivially (i.e., over a one-dimensional subspace). Drawing from intersectional theory in social science and from linguistic theory, we then construct an intersectional subspace to debias for multiple social categories using the nonlinear geometry of individual biases. Empirical evaluations corroborate the efficacy of our approach. Data and implementation code can be downloaded at https://github.com/GitHubLuCheng/Implementation-of-JoSEC-COLING-22.
Impact Remediation: Optimal Interventions to Reduce Inequality
Bynum, Lucius E. J., Loftus, Joshua R., Stoyanovich, Julia
A significant body of research in the data sciences considers unfair discrimination against social categories such as race or gender that could occur or be amplified as a result of algorithmic decisions. Simultaneously, real-world disparities continue to exist, even before algorithmic decisions are made. In this work, we draw on insights from the social sciences and humanistic studies brought into the realm of causal modeling and constrained optimization, and develop a novel algorithmic framework for tackling pre-existing real-world disparities. The purpose of our framework, which we call the "impact remediation framework," is to measure real-world disparities and discover the optimal intervention policies that could help improve equity or access to opportunity for those who are underserved with respect to an outcome of interest. We develop a disaggregated approach to tackling pre-existing disparities that relaxes the typical set of assumptions required for the use of social categories in structural causal models. Our approach flexibly incorporates counterfactuals and is compatible with various ontological assumptions about the nature of social categories. We demonstrate impact remediation with a real-world case study and compare our disaggregated approach to an existing state-of-the-art approach in terms of structure and resulting policy recommendations. In contrast to most work on optimal policy learning, we explore disparity reduction itself as an objective, explicitly focusing the power of algorithms on reducing inequality.